The ITACH® project

The purpose of the ITACH® project (Innovative Technologies And Cultural Heritage Aggregation) is to provide innovative tools that will increase the value of the Italian cultural and tourist industries. The system proposed by the project, currently in development, may be applied to all the information produced by cultural bodies and institutions such as libraries, archives, museums and tourist organisations, and is also intended for use in similar, adjacent or related fields. The project aims to resolve difficulties in a context characterised by:

• a lack of awareness of, and an inability to meet, the sector's need for integrated access to data, regardless of the diversity, quantity, distribution or ownership of the data itself;

• the need for data to be shared and used (or reused), and the presence of organisations or individuals who choose to share data and who can benefit from the creation of organised and accessible 'ecosystems'.

The fundamental questions to be asked are:

• what is the best way of providing access to data so that it may be easily reused?

• how can pertinent data be discovered within a mass of available information?

• how can applications integrate data from heterogeneous and unknown sources?

These issues place the ITACH® project within the larger setting of the semantic web, raising questions about publishing data in accordance with the field's standards of good practice and their technological implementations, such as linked data.


The OpLiDaF platform 

In particular, we will concentrate on one of the system's technological components, the Open Linked Data Framework (OpLiDaF), designed as a framework for the creation, structuring and visualization of data in Resource Description Framework (RDF)/XML format. It is intended as a specialist platform for treating heterogeneously formatted data as linked data (for example through mapping, conversion, cleansing and publication), using ad hoc tools and procedures or integrated open source systems, together with standards and languages recognised by the semantic web.

The main functions of the OpLiDaF platform are:

• the selection of ontologies;

• mapping between the source data and the selected ontology or ontologies;

• the creation of specific ontologies from within a set of data;

• the production of RDF/XML files;

• data cleansing.


The OpLiDaF system is based on an observation of the composition and typology of the data that make up the body of information of libraries, archives, museums, tourist and regional organisations and other institutions, despite differences in both content and format. We could argue that the order of this list reflects a decreasing use of recognised standard formats: from libraries, without doubt the institutions that have made most use of standards for structuring and publishing their own data in the past, to sectors in which data is collated in Access databases or in Excel and CSV files. The libraries themselves, front-runners in standardization, especially through the widespread Machine Readable Cataloguing (MARC) formats, connect this data, which relates mainly to bibliographic descriptions and authority files, with a range of other data in different formats. This is more commonly management data, such as user profiles, lending and reservation data, acquisitions data, or descriptive and administrative data for periodicals and serials, which is often managed, for ease, convenience or tradition, outside the centralised bibliographic database. This heterogeneous and faceted composition of information sources becomes even more evident the further one moves away from traditional library contexts towards museums and archives.
The publication of linked data from relational databases

Analysis of this heterogeneous variety of data, much of which is of great public interest, is accompanied by the awareness that, were this data to be converted into linked data according to recognised and now widespread principles, standards and practices, neither the respective native data management systems nor the business applications would have to be abandoned; we would merely see the addition of a supplementary technological layer linking this data to the semantic web.

The diagram in figure 1 shows a possible workflow for publishing heterogeneous data as linked data.

Without losing ourselves in different workflow hypotheses, we will focus on the considerable potential, through different paths and tools, for transforming data for the semantic web. This applies both to structured data and to textual data, another vast wealth of information that the traditional web rarely exploits in proportion to its high information potential. The interesting scenario is that data can be used (and reused) without necessarily intervening in the legacy systems used by the organisations (we define as legacy an existing information system or application that continues to be used because the user cannot, or will not, replace it).

The policies and practices of data publication on the semantic web vary depending on various factors, including:

• the original format of the data (structured or textual);

• the amount of data to be included in a data set;

• the frequency of data updates.
Figure 1: Workflow of the publication of heterogeneous data in linked data.
OpLiDaF concentrates in particular on the first and last of the three factors above, that is, the differing structure of the original data and the need for updates. It relies on a technological methodology that produces a transverse layer intended to direct and coordinate the different management requirements for this data. If we focus on the library sector, we cannot avoid the treatment of data in MARC format (in particular, conversion from MARC21 to RDF/XML). This is a known process, supported by a vast literature, and may be considered the library's first step towards publishing its own data on the semantic web. We prefer, therefore, to deal with a less crowded field than conversion from MARC21, and will pinpoint the procedures and techniques for treating data contained in relational databases, in order to analyse the potential of the Open Linked Data Framework (OpLiDaF) system, which uses recognised standards and mapping languages.

Much structured bibliographic data in MARC21 is stored in relational databases, allowing the data to be recomposed in MARC format during export or when the data is accessed externally (for example, by a Z39.50 client). Translating data from relational databases into linked data is therefore of particular interest for both bibliographic data and authority files, since the database holds the relational representation of the individual data items of a MARC record. The publication of relational data as linked data is greatly facilitated by the tools now available, which map relational databases to RDF graphs before publishing them on the web according to the principles of linked data. This possibility becomes all the more interesting if we consider the enormous amount of internal management data produced and stored in legacy systems and not necessarily destined for the web as an open and public space, but, for example, for company intranets: the same linked data technology may be used internally, where it is just as useful and necessary for the controlled diffusion of existing information.
The W3C RDB2RDF Working Group is working on standard languages for mapping relational data and relational database schemas to RDF and the Web Ontology Language (OWL): the two main languages available to date are Direct Mapping (DM) and the RDB to RDF Mapping Language (R2RML). From a technological viewpoint, one of the most widespread tools for publishing relational databases on the semantic web is D2R Server, which allows RDF and HTML browsers to navigate database contents and supports SPARQL as a query language.
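To make the idea of a direct mapping concrete, the following Python sketch applies a simplified version of the W3C Direct Mapping rules to a single table: each row becomes a subject IRI built from the table name and primary key, typed with the table IRI, and each non-NULL column becomes a literal-valued triple. It is an illustration, not D2R Server or a conforming implementation: it assumes a single-column primary key, ignores foreign keys and datatypes, uses SQLite in place of Oracle, and the base IRI http://example.org/db/ and the sample row are hypothetical.

```python
import sqlite3
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def direct_mapping(conn, table, pk, base):
    """Simplified Direct Mapping of one table with a single-column primary key.

    Subject:            <base + table + "/" + pk + "=" + value>
    Type:               <base + table>
    Literal predicates: <base + table + "#" + column>
    Table and column names are assumed to be trusted identifiers.
    """
    t = quote(table, safe="")
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cursor.description]
    triples = []
    for row in cursor:
        values = dict(zip(columns, row))
        key = f"{quote(pk, safe='')}={quote(str(values[pk]), safe='')}"
        subject = f"{base}{t}/{key}"
        triples.append((subject, RDF_TYPE, f"{base}{t}"))
        for col in columns:
            if values[col] is not None:  # NULL values produce no triple
                predicate = f"{base}{t}#{quote(col, safe='')}"
                triples.append((subject, predicate, values[col]))
    return triples


# Hypothetical stand-in for the BOOK view, held in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BOOK (ID INTEGER PRIMARY KEY, AUTHOR TEXT, TITLE TEXT, NOTE TEXT)")
conn.execute(
    "INSERT INTO BOOK VALUES (?, ?, ?, ?)",
    (1, "Goni, Enrico", "Nietzsche e l'evoluzionismo /", None),
)
triples = direct_mapping(conn, "BOOK", "ID", "http://example.org/db/")
for triple in triples:
    print(triple)
```

Unlike this sketch, R2RML and R2O let the author of the mapping choose the ontology terms, instead of deriving them from table and column names.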

These are widely recognised standards and technologies for the semantic web, but we are most interested in demonstrating the potential of another language for mapping relational database schemas to ontologies implemented in RDF(S) or OWL, which is used in the OpLiDaF platform: R2O (Relational to Ontology), which provides a rich set of primitives with explicit, well-defined semantics. R2O is a high-level language independent of the RDBMS (in our case, Oracle), and works with databases that use the SQL standard. R2O is based on D2R, but aims to overcome two main limitations of the latter:

• R2O is more powerful and flexible, and therefore more suitable for developing complete mappings, providing a level of expressiveness that D2R lacks;

• R2O, unlike D2R, is a declarative language (that is, it allows us to specify what we want to obtain, without describing how to arrive at the result).

An assumption underlying the use of R2O is that the database and the ontology (implemented in OWL/RDF) are very similar in structure, and that both are pre-existent and do not need to be modified to be used. To demonstrate the generation of RDF data sets from a relational database within the OpLiDaF platform, we have chosen to map to existing ontologies, rather than generating the ontology semi-automatically from the relational database (another possible strategy, very useful in contexts where usable ontologies are not available). The ontologies used in this study are:

• Bibtex = http://zeitkunst.org/bibtex/0.2/bibtex.owl

• Bibo = http://bibotools.googlecode.com/svn/bibo-ontology/tags/1.3/bibo.xml.owl

The relational database, managed with Oracle, contains bibliographic data and is organised, in short, into two views created to map two different ontologies: BOOK (mapped to the Bibtex ontology) and PARTS (mapped to the Bibo ontology). Other tools used for the study include:

• NeOn Toolkit: an open source, multi-platform ontology engineering environment. It is based on the Eclipse development platform and offers numerous plug-ins covering a wide variety of functions linked to the life-cycle of ontologies; one such plug-in is ODEMapster. The NeOn Toolkit was developed as part of the "NeOn" project¹ and is supported by the NeOn Foundation;²

• ODEMapster: a plug-in for the NeOn Toolkit that allows guided and extremely simple mapping operations between relational database tables and the selected ontology, as shown in figure 2, which illustrates the mapping of the BOOK view to the Bibtex ontology.

¹ http://www.neon-project.org

² http://www.neon-foundation.org
Each column of the selected table may be mapped to a class or attribute carefully selected from the ontology used.



Figure 2 


The left-hand section of figures 2 and 3 shows the list of ontologies used or available for use (among these we can also see the Friend Of A Friend, or FOAF, vocabulary, which was included but not used in this trial). The left-hand side of the central part of the screen shows the fields that we intend to map to the selected ontology; the ontologies, in turn, are shown on the right-hand side of the central part of the screen, where BOOK represents the class and the yellow dots are the attributes. The selection of database fields to be mapped to the ontology's attributes depends on the institution's willingness to publish and share this data. In our case, we have carried out a simple example of mapping:

• AUTHOR field: Bibtex.hasAuthor

• TITLE field: Bibtex.hasTitle

• PUBLISHER field: Bibtex.hasPublisher

• NOTE field: Bibtex.hasNote

• LANG field: Bibtex.hasLanguage

Figure 3
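Because R2O is declarative, a mapping like the one above can be thought of as data rather than as code. The Python sketch below, which is not ODEMapster's implementation, expresses the five field mappings as a dictionary and applies them to a single row; columns without a mapping (such as ID) are simply not published. The Bibtex namespace is the one used in the listings; the subject URI http://example.org/book/1 and the row layout are hypothetical.

```python
BIBTEX = "http://purl.org/net/nknouf/ns/bibtex#"

# Declarative mapping: database column -> ontology property (from the list above).
FIELD_MAP = {
    "AUTHOR": BIBTEX + "hasAuthor",
    "TITLE": BIBTEX + "hasTitle",
    "PUBLISHER": BIBTEX + "hasPublisher",
    "NOTE": BIBTEX + "hasNote",
    "LANG": BIBTEX + "hasLanguage",
}


def map_row(subject, row, field_map=FIELD_MAP):
    """Turn one row (column name -> value) into (subject, property, literal) triples.

    Unmapped columns are ignored, and NULL (None) values produce no triple.
    """
    return [
        (subject, prop, row[col])
        for col, prop in field_map.items()
        if row.get(col) is not None
    ]


row = {
    "ID": 1,  # not in FIELD_MAP, so it is not published
    "AUTHOR": "Goni, Enrico",
    "TITLE": "Nietzsche e l'evoluzionismo /",
    "PUBLISHER": "All'insegna del Veltro",
    "NOTE": None,
    "LANG": "ita",
}
triples = map_row("http://example.org/book/1", row)
for triple in triples:
    print(triple)
```

Changing what is published then means editing FIELD_MAP, not the code that applies it, which is the practical benefit of a declarative mapping.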

The phase following the mapping between the relational database and the ontology is the production of the R2O file: the XML document that expresses, in the R2O language, the mapping defined graphically between database and ontology. ODEMapster requires this file to generate the RDF.

Listing 1: Small section of the R2O mapping used by ODEMapster

<?xml version="1.0" encoding="UTF-8"?>
<r2o>
  <dbschema-desc name="AMISV2">
    <has-table name="PART1">
      <!-- ... -->
    </has-table>
    <has-table name="BOOK">
      <nonkeycol-desc name="AUTHOR"/>
      <nonkeycol-desc name="PLACE"/>
      <nonkeycol-desc name="ID"/>
      <nonkeycol-desc name="PUBLISHER"/>
      <nonkeycol-desc name="NOTE"/>
      <nonkeycol-desc name="VOLUME"/>
      <nonkeycol-desc name="LANG"/>
      <nonkeycol-desc name="TITLE"/>
    </has-table>
  </dbschema-desc>
  <conceptmap-def name="http://purl.org/net/nknouf/ns/bibtex#Book">
    <uri-as type="DEFAULT">
      <operation oper-id="concat">
        <arg-restriction on-param="string1">
          <has-value>http://purl.org/net/nknouf/ns/bibtex#Book</has-value>
        </arg-restriction>
        <arg-restriction on-param="string2">
          <has-column>AMISV2.BOOK.AUTHOR</has-column>
        </arg-restriction>
      </operation>
    </uri-as>
    <default_uri-as>
      <operation oper-id="concat">
        <arg-restriction on-param="string1">
          <has-value>http://purl.org/net/nknouf/ns/bibtex#Book</has-value>
        </arg-restriction>
        <arg-restriction on-param="string2">
          <has-column>AMISV2.BOOK.AUTHOR</has-column>
        </arg-restriction>
      </operation>
    </default_uri-as>
    <described-by>
      <attributemap-def name="http://purl.org/net/nknouf/ns/bibtex#hasLanguage">
        <selector>
          <aftertransform>
            <operation oper-id="constant">
              <arg-restriction on-param="const-val">
                <has-column>AMISV2.BOOK.LANG</has-column>
              </arg-restriction>
            </operation>
          </aftertransform>
        </selector>
      </attributemap-def>
      <attributemap-def name="http://purl.org/net/nknouf/ns/bibtex#hasAuthor">
        <!-- ... -->
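The concat operation in Listing 1 builds each book's URI by appending the AUTHOR column to the constant http://purl.org/net/nknouf/ns/bibtex#Book. The Python sketch below reproduces the subject URIs that appear in the RDF extract of Listing 2; the encoding rule it applies (spaces replaced by underscores, remaining reserved characters percent-encoded) is inferred from that output and is an assumption, not documented ODEMapster behaviour.

```python
from urllib.parse import quote

# Constant first argument ("string1") of the concat operation in Listing 1.
BOOK_PREFIX = "http://purl.org/net/nknouf/ns/bibtex#Book"


def book_uri(author):
    """Concatenate the prefix with the AUTHOR value ("string2").

    Assumed encoding: spaces become underscores, then every character
    outside the unreserved set (letters, digits, "_.-~") is percent-encoded.
    """
    return BOOK_PREFIX + quote(author.replace(" ", "_"), safe="")


for author in ["Goni, Enrico", "Festini Cucco, Wally", "Grasso, Agata Rita"]:
    print(book_uri(author))
```

Since the author is the only variable part, two books by the same author would receive the same URI under this mapping; a unique key such as the ID column would avoid that collision.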


In the third phase, the system queries the database, extracts the records and maps them to RDF according to the rules established in the previous phases.
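The third phase can be sketched in Python with the standard library alone: query the table, build one rdf:Description per record and serialise the result as RDF/XML with xml.etree.ElementTree. This is a simplified illustration, not ODEMapster's code; it assumes SQLite in place of Oracle, a hypothetical three-column subset of the mapping, the author-based URI scheme of Listing 1, and the prefix bibtex instead of the generated j.0.

```python
import sqlite3
import xml.etree.ElementTree as ET
from urllib.parse import quote

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BIBTEX = "http://purl.org/net/nknouf/ns/bibtex#"
ET.register_namespace("rdf", RDF)
ET.register_namespace("bibtex", BIBTEX)

# Hypothetical subset of the column -> property mapping.
FIELD_MAP = {"AUTHOR": "hasAuthor", "TITLE": "hasTitle", "LANG": "hasLanguage"}


def export_rdfxml(conn):
    """Query the BOOK table and return its records as an RDF/XML string."""
    root = ET.Element(f"{{{RDF}}}RDF")
    query = f"SELECT {', '.join(FIELD_MAP)} FROM BOOK ORDER BY ID"
    for row in conn.execute(query):
        values = dict(zip(FIELD_MAP, row))
        if values["AUTHOR"] is None:
            continue  # no author, so no subject URI under this scheme
        about = BIBTEX + "Book" + quote(values["AUTHOR"].replace(" ", "_"), safe="")
        desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": about})
        ET.SubElement(desc, f"{{{RDF}}}type", {f"{{{RDF}}}resource": BIBTEX + "Book"})
        for column, prop in FIELD_MAP.items():
            if values[column] is not None:
                ET.SubElement(desc, f"{{{BIBTEX}}}{prop}").text = str(values[column])
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BOOK (ID INTEGER PRIMARY KEY, AUTHOR TEXT, TITLE TEXT, LANG TEXT)")
conn.executemany(
    "INSERT INTO BOOK (AUTHOR, TITLE, LANG) VALUES (?, ?, ?)",
    [
        ("Goni, Enrico", "Nietzsche e l'evoluzionismo /", "ita"),
        ("Dibenedetto, Giuseppe", "Lineamenti di archivistica /", "ita"),
    ],
)
xml_text = export_rdfxml(conn)
print(xml_text)
```

Building the tree with ElementTree, rather than concatenating strings, ensures that characters such as & and < in catalogue values are escaped correctly.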


Listing 2 shows an extract of the resulting RDF file, to assist reading.


Listing 2: Extract of an RDF file 

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.0="http://purl.org/net/nknouf/ns/bibtex#">

  <rdf:Description rdf:about="http://purl.org/net/nknouf/ns/bibtex#BookGoni%2C_Enrico">
    <j.0:hasVolume> </j.0:hasVolume>
    <j.0:hasPublisher>All'insegna del Veltro</j.0:hasPublisher>
    <rdf:type rdf:resource="http://purl.org/net/nknouf/ns/bibtex#Book"/>
    <j.0:hasLanguage>ita</j.0:hasLanguage>
    <j.0:hasAuthor>Goni, Enrico</j.0:hasAuthor>
    <j.0:hasNote>92 p. ; c17 cm.</j.0:hasNote>
    <j.0:hasTitle>Nietzsche e l'evoluzionismo /</j.0:hasTitle>
  </rdf:Description>

  <rdf:Description rdf:about="http://purl.org/net/nknouf/ns/bibtex#BookFestini_Cucco%2C_Wally">
    <j.0:hasLanguage>ita</j.0:hasLanguage>
    <j.0:hasAuthor>Festini Cucco, Wally</j.0:hasAuthor>
    <rdf:type rdf:resource="http://purl.org/net/nknouf/ns/bibtex#Book"/>
    <j.0:hasNote>161 p. ; c22 cm.</j.0:hasNote>
    <j.0:hasPublisher>Angeli</j.0:hasPublisher>
    <j.0:hasTitle>Psicologia degli scacchi : bsimboli e affetti /</j.0:hasTitle>
    <j.0:hasVolume> </j.0:hasVolume>
  </rdf:Description>

  <rdf:Description rdf:about="http://purl.org/net/nknouf/ns/bibtex#BookDibenedetto%2C_Giuseppe">
    <j.0:hasAuthor>Dibenedetto, Giuseppe</j.0:hasAuthor>
    <j.0:hasTitle>Lineamenti di archivistica /</j.0:hasTitle>
    <j.0:hasLanguage>ita</j.0:hasLanguage>
    <j.0:hasPublisher>Levante</j.0:hasPublisher>
    <rdf:type rdf:resource="http://purl.org/net/nknouf/ns/bibtex#Book"/>
    <j.0:hasNote>373 p. ; c24 cm.</j.0:hasNote>
    <j.0:hasVolume> </j.0:hasVolume>
  </rdf:Description>

  <rdf:Description rdf:about="http://purl.org/net/nknouf/ns/bibtex#BookGrasso%2C_Agata_Rita">
    <j.0:hasLanguage>ita</j.0:hasLanguage>
    <j.0:hasTitle>Le difficolta di apprendimento: guida bibliografica : testi per gli alunni e volumi per gli insegnanti/</j.0:hasTitle>
    <j.0:hasAuthor>Grasso, Agata Rita</j.0:hasAuthor>
    <rdf:type rdf:resource="http://purl.org/net/nknouf/ns/bibtex#Book"/>
    <j.0:hasNote>94 p. ; c24 cm.</j.0:hasNote>
    <j.0:hasVolume> </j.0:hasVolume>
    <j.0:hasPublisher>Edizioni del cerro</j.0:hasPublisher>
  </rdf:Description>

</rdf:RDF>

Figure 4


The RDF may be viewed alongside the content of the relational database, as illustrated in figure 5.

Figure 5


Data cleansing

Further analysis of the RDF files produced reveals certain limitations and errors that conflict with the principles of linked data production. These cases are illustrated below. We must note that, in our case, some of these errors result from the scarce content of the data used, and therefore from its limited expressiveness:

• the file contains a relatively low number of assertions and of relations between entities (in the example reproduced, the only relation is rdf:type, which links each record to the Book class);

• the majority of the assertions have literals as their objects, making the RDF resources "poor" and isolated: the author in our example should be an autonomous entity, identified by a Uniform Resource Identifier (URI) reference, and not a literal; therefore not <j.0:hasAuthor>Goni, Enrico</j.0:hasAuthor> but <j.0:hasAuthor rdf:resource="http://atcult.it/autori/283235467"/>;

• in some cases the values contain separating characters and sub-field codes inherited from the way the data is structured in the Oracle tables (these are the sub-field codes present in the MARC21 record produced during cataloguing), as in the example <j.0:hasNote>94 p. ; c24 cm.</j.0:hasNote>, where the sub-field code $c of the record's 300 tag appears before the sub-field giving the resource's dimensions;

• some assertions are invalid, as they do not have an object and therefore cannot be expressed as triples (which must be composed of subject, predicate and object).
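The problems listed above can be detected mechanically. The following Python sketch parses an RDF/XML fragment shaped like Listings 2 and 3 and reports empty objects, values that still contain a MARC sub-field code, and literals used where an entity is expected. The sub-field pattern and the choice of hasAuthor as the only entity-valued property are assumptions made for the example; a real audit would run over the whole graph.

```python
import re
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BIBTEX = "http://purl.org/net/nknouf/ns/bibtex#"
# Assumed heuristic: a lowercase sub-field code glued to a number after ISBD
# punctuation, e.g. "c24" in "94 p. ; c24 cm." (tag 300, sub-field $c).
SUBFIELD_CODE = re.compile(r"[;:]\s*[a-z](?=\d)")
# Properties whose object should be a resource (URI), not a literal (assumption).
ENTITY_PROPERTIES = {f"{{{BIBTEX}}}hasAuthor"}


def audit(rdf_xml):
    """Return (subject, property, problem) tuples for suspicious assertions."""
    problems = []
    for desc in ET.fromstring(rdf_xml).iter(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            if prop.get(f"{{{RDF}}}resource") is not None:
                continue  # the object is a resource: a genuine link
            text = (prop.text or "").strip()
            if not text:
                problems.append((subject, prop.tag, "empty object"))
            elif SUBFIELD_CODE.search(text):
                problems.append((subject, prop.tag, "MARC sub-field code"))
            elif prop.tag in ENTITY_PROPERTIES:
                problems.append((subject, prop.tag, "literal instead of URI"))
    return problems


SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:j.0="http://purl.org/net/nknouf/ns/bibtex#">
  <rdf:Description rdf:about="http://purl.org/net/nknouf/ns/bibtex#BookDibenedetto%2C_Giuseppe">
    <j.0:hasAuthor></j.0:hasAuthor>
    <j.0:hasNote>373 p. ; c24 cm.</j.0:hasNote>
  </rdf:Description>
  <rdf:Description rdf:about="http://purl.org/net/nknouf/ns/bibtex#BookGoni%2C_Enrico">
    <rdf:type rdf:resource="http://purl.org/net/nknouf/ns/bibtex#Book"/>
    <j.0:hasAuthor>Goni, Enrico</j.0:hasAuthor>
    <j.0:hasLanguage>ita</j.0:hasLanguage>
  </rdf:Description>
</rdf:RDF>"""

for problem in audit(SAMPLE):
    print(problem)
```

The pattern deliberately matches only codes followed by a digit, so a code glued to a word, such as the b before "simboli" in Listing 2, would be missed; catching those needs knowledge of the MARC field each value came from.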


Listing 3: Example of an invalid RDF triple. The question is: who is the author?

<rdf:Description
    rdf:about="http://purl.org/net/nknouf/ns/bibtex#BookDibenedetto%2C_Giuseppe">
  <j.0:hasAuthor></j.0:hasAuthor>
</rdf:Description>

On the basis of this analysis of the RDF file produced in OpLiDaF, a series of procedures may be put in place for what can be defined as the data cleansing phase, including:

• the use of cleansing tools to eliminate easily identifiable stray characters, such as the sub-field codes of MARC21 tags;

• the identification of triple-scanning processes for validity checking;

• the drawing up of control procedures, and the identification of triples whose object is a literal as opposed to triples linking RDF resources;

• the automatic creation of entities that may be identified by URI, using, for example, unambiguous identifiers, which in the majority of cases are already present in the relational databases, or are created according to established criteria.
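Three of these procedures can be sketched in Python on a list of (subject, predicate, object) triples: stripping a sub-field code glued to a number, dropping assertions whose object is empty, and replacing an author literal with a URI. The author URI base and the identifier for "Goni, Enrico" come from the example given earlier in the text; the URIRef marker class is a hypothetical stand-in for an RDF library's resource type, and the name-to-identifier lookup would in practice be drawn from the identifiers already stored in the relational database.

```python
import re

BIBTEX = "http://purl.org/net/nknouf/ns/bibtex#"
# Assumed pattern: a sub-field code letter glued to a digit after ISBD
# punctuation, e.g. "94 p. ; c24 cm." -> "94 p. ; 24 cm."
SUBFIELD_CODE = re.compile(r"([;:]\s*)[a-z](?=\d)")
AUTHOR_BASE = "http://atcult.it/autori/"


class URIRef(str):
    """Hypothetical marker: the object is a resource (URI), not a literal."""


def cleanse(triples, author_ids):
    """Return a cleaned copy of (subject, predicate, object) triples.

    author_ids maps an author literal to a stable identifier.
    """
    cleaned = []
    for subject, predicate, obj in triples:
        if isinstance(obj, str) and not isinstance(obj, URIRef):
            obj = SUBFIELD_CODE.sub(r"\1", obj).strip()
            if not obj:
                continue  # no object, so no valid triple
            if predicate == BIBTEX + "hasAuthor" and obj in author_ids:
                obj = URIRef(AUTHOR_BASE + author_ids[obj])
        cleaned.append((subject, predicate, obj))
    return cleaned


goni = "http://purl.org/net/nknouf/ns/bibtex#BookGoni%2C_Enrico"
grasso = "http://purl.org/net/nknouf/ns/bibtex#BookGrasso%2C_Agata_Rita"
raw = [
    (goni, BIBTEX + "hasAuthor", "Goni, Enrico"),
    (grasso, BIBTEX + "hasNote", "94 p. ; c24 cm."),
    (goni, BIBTEX + "hasVolume", " "),
]
cleaned_triples = cleanse(raw, {"Goni, Enrico": "283235467"})
for triple in cleaned_triples:
    print(triple)
```

Keeping the cleansing step as a pure function over triples means it can be rerun on each new export, which suits sources that are updated frequently.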
In terms of data sharing, quality is of the utmost importance, and must be a fundamental criterion in the selection of any data set produced by third parties wishing to share their own data and link it to this one.


OpLiDaF and the life-cycle of linked data 

To conclude, we summarise which steps of the linked data life-cycle the OpLiDaF platform is able to cover. The life-cycle may be subdivided in various ways; here we divide it into seven steps, following the "Methodological Guidelines for Publishing Government Linked Data":

1. identification of data sources;

2. modelling of the vocabulary;

3. generation of data in RDF format, through the different available mapping languages;

4. publication of the data in RDF;

5. cleansing of the data produced;

6. creation of links between different data sets;

7. making the data available, through various steps, including the publication of the resulting data set on the CKAN (Comprehensive Knowledge Archive Network) registry.

The platform appears to be able to completely satisfy steps 2 to 5, and constitutes a useful tool for anyone wishing to produce linked data (regardless of the management system, the data format, the size of the data set and the mode and frequency of updates), helping to resolve some of the obstacles and problems that the passage from the traditional web to the semantic web may pose.